ABSTRACT

The clustering of dark halos depends not only on their mass but also on their assembly 
history, a dependence we term 'assembly bias'. Using a galaxy formation model grafted 
onto the Millennium Simulation of the ΛCDM cosmogony, we study how assembly bias
affects galaxy clustering. We compare the original simulation to 'shuffled' versions 
where the galaxy populations are randomly swapped among halos of similar mass, 
thus isolating the effects of correlations between assembly history and environment at 
fixed mass. Such correlations are ignored in the halo occupation distribution models 
often used to populate dark matter simulations with galaxies, but they are significant in
our more realistic simulation. Assembly bias enhances 2-point correlations by ~10% for
galaxies with $M_{b_J} - 5\log h$ brighter than $-17$, but suppresses them by a similar amount
for galaxies brighter than $-20$. When such samples are split by colour, assembly bias
is 5% stronger for red galaxies and 5% weaker for blue ones. Halo central galaxies are 
differently affected by assembly bias than are galaxies of all types. It almost doubles 
the correlation amplitude for faint red central galaxies. Shuffling galaxies among halos 
of fixed formation redshift or concentration in addition to fixed mass produces biases 
which are not much smaller than when mass alone is fixed. Assembly bias must reflect 
a correlation of environment with aspects of halo assembly which are not encoded 
in either of these parameters. It induces effects which could compromise precision 
measurements of cosmological parameters from large galaxy surveys. 
1 INTRODUCTION



In a recent study, Gao et al. (2005, hereafter GSW05)
showed that the clustering of dark matter halos can depend
strongly on their formation redshift. Many current
galaxy clustering models adopt simplified prescriptions for
populating halos with galaxies based on an implicit assumption
which is inconsistent with this result, namely that
the assembly history of a halo of given mass (and thus its
galaxy content) is statistically independent of its larger scale
environment (e.g. Kauffmann et al. 1997; Jing et al. 1998;
Peacock & Smith 2000; Benson et al. 2000; Berlind et al.
2003; Yang et al. 2003). GSW05 found that those halos
with $M_{\rm vir} < 10^{13}\,M_\odot$ that assembled at high redshift are
substantially more clustered than halos of similar mass 
that assembled more recently. Earlier studies had missed
the strength of this dependence (e.g. Lemson & Kauffmann
1999; Sheth & Tormen 2004), apparently because they were
based on simulations of insufficient size and resolution to
reliably reach the relevant regime. Following GSW05, other
authors have demonstrated significant dependences of clustering
on halo properties such as concentration and subhalo
occupation number, which are strongly correlated with formation
redshift (e.g. Wechsler et al. 2006; Zhu et al. 2006).

If the assembly history of dark matter halos is corre- 
lated with their large-scale environment, we may expect the 
same to be true for their galaxy content. This will then affect 
the large-scale clustering of galaxies in a way which depends 
on how galaxy properties are established during halo assembly,
i.e. on the physics of galaxy formation. A number
of recent studies have addressed this question (e.g. Yoo et al.
2006; Abbas & Sheth 2006; Harker et al. 2006; Yang et al. 2006;
Reed et al. 2006). Yang et al. (2006)
find that, at fixed mass, the clustering of galaxy groups correlates
quite strongly with the star formation rate of the
central galaxy. On the other hand, Skibba et al. (2006) and
Abbas & Sheth (2006) found the clustering in their analysis
of SDSS data to be consistent with models with no dependence
of halo galaxy populations on halo environment.
Yoo et al. (2006) randomly shuffle galaxies between halos of
similar mass in a small volume simulation (box side length
$50\,h^{-1}$Mpc) and find 5-10% effects that are at the level
of the statistical uncertainty of their calculation. Much of
this work was a response to the presentation of early results
from the present project during summer 2005^1.

The Halo Occupation Distribution (or HOD) method 
for predicting galaxy clustering has become popular be- 
cause it bypasses the need to model the physics of galaxy 
formation when analysing the spatial distribution of galax- 
ies on large scales, for example to constrain the shape and 
amplitude of the primordial spectrum of density fluctua- 
tions. In 'classic' HOD models the galaxy population of a 
halo depends on its mass alone. This makes it possible to 
marginalise over the parameters describing possible occupa- 
tion distributions in order to constrain more "fundamental" 
quantities. Many intended applications require precise mea- 
surements and realistic error estimates, so it is important 
to quantify any systematic uncertainties introduced implic- 
itly by the HOD method. Future large-scale surveys hope 
to clarify the nature of Dark Matter and of Dark Energy 
through percent-level measurements of the clustering of very 
large numbers of galaxies at both low and high redshift (e.g.
PanSTARRS (Kaiser et al. 2002) or the Dark Energy Survey
(Abbott et al. 2005)). Interpretation will require theoretical
models with uncertainties significantly below this level. 

In this paper, we quantify how correlations between 
halo environment and halo assembly history affect galaxy 
clustering. We use a simulation of the formation and evolution
of the galaxy population within a very large region
($0.125\,h^{-3}$Gpc$^3$) in the concordance ΛCDM cosmogony
(Croton et al. 2006). This simulation was carried out by integrating
simplified equations for the evolution of the baryonic
component within a stored representation of the evolving
dark matter distribution of the largest high-resolution
cosmological simulation carried out to date, the so-called
'Millennium Simulation' (Springel et al. 2005). Since our
galaxy formation modelling explicitly follows the assembly of 
each dark matter structure, it automatically takes care of ef- 
fects induced by correlations of halo environment with halo 
assembly history. We test whether such effects are signifi- 
cant by randomly swapping the galaxy populations of halos 
of identical mass. Such shuffling does alter galaxy clustering 
on large scales, although it would not if the 'classic' HOD as- 
sumption were correct. Our results should reliably indicate 
the characteristic strength of such systematic effects, even if 
our specific galaxy formation model is later superseded. 

The outline of this paper is as follows. In Section 2 we
briefly introduce our simulation and our galaxy formation
model. Section 3 then describes in detail how we shuffle
galaxy populations between halos of similar properties (i.e.
similar mass, but perhaps also similar concentration or formation
redshift). The differences in clustering between the
galaxy distribution in the original simulation and those produced
by such shuffling are explored as a function of the
luminosity and colour of galaxies in Section 4. This quantifies
the systematic errors to be expected in HOD models
and addresses the issue of whether they can be reduced by
making the HOD depend on additional halo properties. We
conclude in Section 5 with a brief discussion and summary.



^1 http://www.mpa-garching.mpg.de/~swhite/talk/NNG05.pdf



2 SIMULATION DATA 



The Millennium Simulation follows evolution in the distribution
of just over 10 billion dark matter particles in a
periodic box of side $500\,h^{-1}$Mpc. The mass per particle is
$8.6 \times 10^{8}\,h^{-1}M_\odot$. The adopted cosmological parameter
values are $\Omega_\Lambda = 0.75$, $\Omega_m = \Omega_{\rm dm} + \Omega_b = 0.25$, $\Omega_b = 0.045$,
$h = 0.73$, and $\sigma_8 = 0.9$, consistent with a combined analysis
of the 2dFGRS (Colless et al. 2001) and first year WMAP
data (Spergel et al. 2003; Seljak et al. 2005). The dark matter
distribution is stored at 64 times spaced approximately
logarithmically in expansion factor at early times, and at approximately
300 Myr intervals after $z = 1$. Friends-of-friends
(FOF) halos are identified in the simulation at each stored
output with a linking length 0.2 times the mean particle
separation. Substructure is then identified within each halo
using an improved and extended version of the SUBFIND
algorithm of Springel et al. (2001). Having found all halos
and their subhalos at all output times, hierarchical merging
trees are constructed which describe in detail how each
structure grows as the universe evolves. These trees are identical
to those used by GSW05 and are the representation of
the evolving dark matter distribution within which the simulation
of galaxy formation is carried out. Further details of
the dark matter simulation and of these procedures can be
found in Springel et al. (2005).



Our simulation of the formation and evolution of the
galaxy population follows the methodology introduced by
Kauffmann et al. (1999) and extended by Springel et al.
(2001). Virialised dark matter halos at each redshift are assumed
to have collapsed with their "fair" share of baryons
(e.g. $\Omega_b/\Omega_m$ times their total mass) from which galaxies
form and evolve. The simulation follows the evolution of
the galaxy population in each merger tree. It includes a
wide range of galaxy formation physics using simple, physically
based models tuned to represent both relevant observational
data and more detailed simulations (for details see
Croton et al. 2006). Importantly, it is the detailed merging,
accretion and disruption histories of the dark matter halos
and their substructures that drive the baryonic modelling
and thus ultimately determine the galaxy content of the
halos. Croton et al. (2006) and Springel et al. (2005) show
that this two-stage simulation scheme can produce a galaxy
population consistent with many observed properties of the
local population. These include the galaxy luminosity function,
the bimodal distribution of colours, the morphology
distribution, the Tully-Fisher relation, and 2-point galaxy
correlation functions for samples selected by luminosity and
type. However, this model is not, of course, perfect. For
example, Weinmann et al. (2006) show that it incorrectly
predicts some aspects of the colour distribution of satellite
galaxies in group-sized halos, and this may impact measurements
that are dependent on colour selection. To minimise
such uncertainties we will always consider relative measures
of bias between shuffled and unshuffled catalogues to indicate
the expected size of the assembly bias effect. Due to
the large volume of the Millennium Simulation our simulated
galaxy catalogue is unprecedented in size, containing
5 178 238 galaxies brighter than $M_{b_J} - 5\log h = -17$.
3 THE SHUFFLING TECHNIQUE

If galaxy populations within dark matter halos of a given
mass are statistically independent of all halo properties 
other than mass, as assumed in the simplified clustering 
models described in the Introduction, then galaxy clustering 
should not depend on how the individual realisations of the 
satellite-central galaxy population are distributed among 
the various halos of that mass. We test this by comparing
galaxy correlation functions estimated from the catalogue
described in Section 2 with identically defined correlation
functions estimated from "shuffled" catalogues in which
satellite-central galaxy populations are randomly exchanged 
between halos of similar mass. If halo assembly history is in- 
deed independent of halo environment, such shuffling should 
have no effect on the estimated correlations. 

More specifically, for each FOF dark matter halo we
first record the position offsets of all its galaxies with respect
to the 'central' galaxy. This central galaxy sits at the
bottom of the halo's potential well, while further galaxies
are satellites which may or may not be associated with subhalos
catalogued by SUBFIND. We then rank-order all halos
by virial mass and divide them into mass bins of width
$\Delta \log M_{\rm vir} = 0.1$ (although because of the rapidly decreasing
number of halos in the tail of the mass function the two
most massive bins are widened to $\log M_{\rm vir} = 14.8 - 15.0$
and $\log M_{\rm vir} = 15.0 - 15.5$. Note that the gradient of the assembly
bias effect across this mass range is small, as shown
in Figure 3 of Wechsler et al. (2006). Note also that here
and elsewhere we define $M_{\rm vir}$ as the mass within the largest
sphere surrounding the halo's potential minimum with mean
enclosed density at least 200 times the critical value). We
then randomly shuffle the galaxy populations of the halos
in each bin. When doing this, we take the new central
galaxy of each halo to have the same position as the original
central galaxy and we determine the positions of the
new satellites using their recorded offsets from their central
galaxy. Each central galaxy thus moves together with its
own set of satellites. In the language of HOD modellers (e.g.
Cooray & Sheth 2002) this procedure exactly preserves all
1-halo contributions to galaxy clustering statistics. Any differences
between the original and the shuffled catalogues can
arise from 2-halo terms only. The shuffling is done 10 times
with different random seeds to create 10 different shuffled
galaxy catalogues. It can also be carried out among halos
for which a second variable, such as halo formation redshift,
has been matched in addition to halo mass. We show the
effect of such extra constraints in Section 4.3.
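
The shuffling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the function and argument names are hypothetical, bins are fixed at 0.1 dex (the widened high-mass bins are not implemented), and the optional periodic wrap with `box` is our addition.

```python
import numpy as np

def shuffle_populations(log_mvir, bin_width=0.1, rng=None):
    """For each halo, return the index of the donor halo whose galaxy
    population it receives. Donors are permuted within bins of log10(Mvir)."""
    rng = np.random.default_rng(rng)
    log_mvir = np.asarray(log_mvir, dtype=float)
    bin_id = np.floor(log_mvir / bin_width).astype(int)
    donor = np.arange(log_mvir.size)
    for b in np.unique(bin_id):
        members = np.flatnonzero(bin_id == b)
        donor[members] = rng.permutation(members)
    return donor

def shuffled_positions(halo_centre, gal_halo, gal_offset, donor, box=None):
    """Place every galaxy at (centre of its new host halo) + (its recorded
    offset from its original central galaxy). Centrals have zero offset."""
    halo_centre = np.asarray(halo_centre, dtype=float)
    # new_host[d] is the halo that receives the population of donor halo d
    new_host = np.empty_like(donor)
    new_host[donor] = np.arange(donor.size)
    pos = halo_centre[new_host[np.asarray(gal_halo)]] + np.asarray(gal_offset, dtype=float)
    # Optional periodic wrap (hypothetical parameter, not from the text)
    return pos if box is None else np.mod(pos, box)
```

Because each population moves rigidly with its central galaxy, pair separations inside a halo are unchanged, which is why only 2-halo terms can differ. Calling `shuffle_populations` with ten different seeds would give ten shuffled catalogues.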

To quantify the difference in clustering between the actual
and the shuffled galaxy catalogues we measure the 2-point
autocorrelation function for each and plot their relative
bias, $b(r)$, defined by

$$ b(r) = \left( \frac{\xi_{\rm orig}(r)}{\xi_{\rm shuff}(r)} \right)^{1/2} . \qquad (1) $$

Here $\xi_{\rm shuff}$ is the 2-point function of the shuffled catalogue
at pair separation $r$, and $\xi_{\rm orig}$ is the corresponding 2-point
function for the original (unshuffled) catalogue. Note that a
value of $b > 1$ implies that the shuffling dilutes the clustering
of the original distribution. Note also that whenever we estimate
$b$ below, exactly the same galaxy set is used to estimate
both $\xi_{\rm orig}$ and $\xi_{\rm shuff}$. Only the positions of the galaxies are
changed by the shuffling.
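
As an illustration, the relative bias is the square root of the ratio of the original to the shuffled correlation function, evaluated bin by bin. The sketch below assumes both functions have already been measured on the same separation bins; the function name is hypothetical.

```python
import numpy as np

def relative_bias(xi_orig, xi_shuff):
    """b(r) = sqrt(xi_orig / xi_shuff), evaluated per separation bin."""
    xi_orig = np.asarray(xi_orig, dtype=float)
    xi_shuff = np.asarray(xi_shuff, dtype=float)
    return np.sqrt(xi_orig / xi_shuff)
```

Since $b$ is a square root, a value of $b \approx 1.4$ corresponds to a correlation amplitude roughly twice that of the shuffled catalogue.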



4 RESULTS 

4.1 The strength of second parameter effects 

In Fig. 1 we plot the relative bias between our 10 shuffled
galaxy catalogues and the original Millennium Run catalogue
as a function of pair separation, and for subsets of
galaxies selected in various ways. In this subsection we show
results for subcatalogues which contain only galaxies in subhalos
with mass (as defined by SUBFIND, see Springel et al.
2005) greater than $M_{\rm vir} = 5.5 \times 10^{10}\,h^{-1}M_\odot$ (i.e. > 64 simulation
particles). This means that we consider only galaxies
which reside in well-resolved dark matter (sub)structures at
$z = 0$. In the top panel, relative bias functions are shown for
this sample as a whole and for subsamples split by colour at
$B - V = 0.8$ (see Fig. 9 of Croton et al. 2006). The bottom
panel presents a similar analysis but further restricts the
catalogues to contain only the central galaxies of the halos.
Note that for all statistics there is a very small scatter between
the 10 relative bias measurements. This demonstrates
that 'small sample' effects are negligible for the questions we
address here.

Consider first the top panel of Fig. 1. The galaxy population
as a whole (the solid lines) shows a systematic bias of
~3% on large scales. Shuffling has reduced the strength of
clustering by a small but significant amount. Note that this
result is independent of the galaxy formation model, since
shuffling does not change the set of central galaxy positions
but merely reassigns populations of well-resolved subhalos
among halos of similar mass. Clearly the assembly histories
of dark halos are not independent of their clustering
properties (as GSW05 already showed) and this does affect
galaxy clustering^2. Red galaxies (long-dashed lines) are biased
in the same way as the sample as a whole but at the
~15% level, while blue galaxies (dashed-dotted lines) are biased
with the opposite sign at the ~5% level. These results
do, of course, depend on the galaxy formation model which
determines whether galaxies are red or blue. The overall
bias is effectively a weighted average of these two partially
compensating effects. Note that bias is negligible on small
scales and grows to a value which is almost constant for
$r \gtrsim 3\,h^{-1}$Mpc. This reflects the fact that only the 2-halo
term contributes (i.e. clustering between galaxies which live
in different halos). This is diluted on small scales by 1-halo
clustering which is identical in all catalogues. The change
in clustering amplitude for galaxies (here $(1 - b)^2 \sim 30\%$)
is smaller than that found by GSW05 for dark matter ha- 
los (which was up to a factor of ~ 5). This is because we 
sum clustering contributions from halos with a wide range in 
mass, thereby diluting the predominantly low-mass GSW05 
effect. 

^2 A preliminary k-space analysis by Nikhil Padmanabhan using
one of our shuffled catalogues suggests that the effects of assembly
bias on the power spectrum amplitude are non-negligible out to
scales of at least $k \sim 0.05\,h\,{\rm Mpc}^{-1}$ ($> 100\,h^{-1}$Mpc), after which
noise dominates the signal. We leave a detailed power spectrum 
analysis of assembly bias for future work. 
Figure 1. The relative bias between the original and the shuffled
galaxy populations in subhalos more massive than
$5.5 \times 10^{10}\,h^{-1}M_\odot$ (i.e. > 64 simulation particles) as a function of pair
separation (Eq. 1). The top panel shows results for all galaxies,
whereas the bottom panel is restricted to central galaxies (resulting
in one and only one galaxy per halo). In each panel solid
lines refer to the full sample, while long-dashed lines are for blue
galaxies and dashed-dotted lines for red galaxies. The two subpopulations
are split at $B - V = 0.8$. Strong bias effects are seen
in a number of cases demonstrating that the galaxy content of a
halo of given mass is correlated with the halo's large-scale environment.



When we consider the clustering of central galaxies only 
the total number of galaxies in these catalogues is reduced 
by approximately 30% and the relative bias functions change 
considerably. By definition, there is now one and only one 
galaxy in each dark halo so there is no 1-halo contribution to 
the correlation functions. In addition, the correlation function
for the population as a whole is invariant under shuffling.
Thus the solid lines in the lower panel of Fig. 1 all
coincide with $b(r) = 1$. There are, however, substantial effects
when the population is split by colour, demonstrating
that the colour of the central galaxy in a halo of given mass 
depends significantly on the halo's environment. Halos with 
red central galaxies show a strong relative bias (~ 40% on 
large scales, rising to ~ 80% on small scales) while halos 
with blue central galaxies show a weaker one which is very 
similar to that for all blue galaxies (~5%). The strong effect
for red central galaxies reflects the fact that such objects are 
found primarily in two very specific types of halo: massive 
halos where cooling and star-formation have been curtailed 
by AGN feedback; and lower mass halos which have just 
passed through a more massive system, thereby losing their 
hot gas atmospheres and so their source of fuel for star for- 
mation. Both cases are associated with a massive halo, hence 
the high clustering amplitude. The great majority of central
galaxies are associated with more isolated and/or lower
mass halos and have ongoing star formation; these objects 
are blue. 



4.2 Assembly bias as a function of galaxy 
luminosity 

We now generalise the above results including all galaxies 
which are well resolved by the formation model regardless 
of their subhalo mass at $z = 0$. Fig. 2 shows the relative
bias between the shuffled and the original galaxy populations
as a function of both colour and luminosity. On scales
$r \gtrsim 3\,h^{-1}$Mpc 1-halo terms do not contribute to the correlations
and the relative bias is approximately constant for
all samples we have considered. For simplicity we therefore
average the relative bias measurements for each of our 10
shuffled catalogues over the separation range $6$-$12\,h^{-1}$Mpc
and we characterise the result by the mean and $1\sigma$ scatter
of these values. In the following we refer to this quantity 
as the assembly bias as it measures the bias induced by the 
environmental dependence of halo assembly history at fixed 
halo mass. 
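
A sketch of this averaging step follows. It assumes an unweighted mean over the separation bins that fall in the 6-12 $h^{-1}$Mpc range and a sample standard deviation across catalogues; the text does not specify either choice, and the function name is hypothetical.

```python
import numpy as np

def assembly_bias(r, b_of_r, r_min=6.0, r_max=12.0):
    """Average b(r) over r_min <= r <= r_max for each shuffled catalogue,
    then return the mean and the scatter across catalogues.

    r      : (n_bins,) pair separations in h^-1 Mpc
    b_of_r : (n_catalogues, n_bins) relative bias curves
    """
    r = np.asarray(r, dtype=float)
    b_of_r = np.atleast_2d(np.asarray(b_of_r, dtype=float))
    in_range = (r >= r_min) & (r <= r_max)
    per_catalogue = b_of_r[:, in_range].mean(axis=1)
    # ddof=1 gives the sample standard deviation (an assumption);
    # it is undefined (NaN) if only one catalogue is supplied.
    return per_catalogue.mean(), per_catalogue.std(ddof=1)
```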

The top panel of Fig. 2 shows this assembly bias for absolute
magnitude limited subsamples of galaxies as a function
of their magnitude limit. Again we plot results for galaxies
of all colours (solid line) and for blue (dot-dashed line)
and red (long-dashed line) galaxies separately. The bottom 
panel shows an identical analysis but for samples restricted 
to central galaxies. Note that selecting galaxy subsamples by 
limiting stellar mass rather than luminosity produces similar 
behaviour to that presented below. This is expected given 
that the scatter in log(M/L) for the galaxies is typically 
small in comparison with the magnitude range over which 
the assembly bias changes. 

If we focus first on the upper panel of Fig. 2 we see
that correlations between assembly history and environment
at fixed halo mass can either enhance (for faint galaxies)
or dilute (for bright galaxies) the strength of galaxy
clustering, with a transition near the characteristic luminosity
$L_*$ of the galaxy luminosity function. Fainter than
$M_{b_J} - 5\log h \sim -20.5$ bias values for the red and blue subpopulations
are symmetrically offset from the curve for the
population as a whole by about 5%. Brighter than this, the
bias for the population as a whole approaches that for the
red subpopulation, reflecting the fact that there are few blue
galaxies at these magnitudes. At $M_{b_J} - 5\log h \sim -20$ blue
galaxies have an assembly bias of about 0.9, showing that
they occupy halos with significantly lower density environments
than randomly selected halos of the same mass.

In the bottom panel of Fig. 2 we show the assembly
bias for absolute magnitude limited samples of central galax- 
ies (i.e. for samples of halos defined by the luminosity and 
colour of their central galaxies). A notable difference from 
the central galaxy samples studied in the bottom panel of 
Fig. 1 (which were defined by the mass of their halos) is
that the assembly bias differs from unity not only for the 
red and blue subsamples but also for samples without colour 
selection. This difference is caused by scatter in the relation 
between halo mass and central galaxy luminosity which cor- 
relates with halo environment in a way that is different for 
halos with faint ($L < L_*$) and with bright ($L > L_*$) central
galaxies. Low-mass halos with brighter than average central 
Figure 2. The assembly bias (i.e. the enhancement in clustering
induced by correlations between halo assembly history and large-scale
environment at fixed halo mass) as a function of magnitude
limit for absolute magnitude limited samples of galaxies. The upper
panel gives results for all galaxies and the lower panel for
central galaxies only. In each panel the solid line gives results for
galaxies of all colours, while the long-dashed and dashed-dotted
lines are for blue and red subsamples respectively. The two samples
are split at $B - V = 0.8$. The grey shaded region surrounding
each line indicates the $1\sigma$ scatter in assembly bias values for our
10 shuffled catalogues. Significant bias (i.e. $b \neq 1$) is seen for almost
all galaxy subsamples.



galaxies are in denser than average environments, while the 
opposite is true for higher mass halos. At all magnitudes 
blue central galaxies inhabit halos with lower density environments
than red central galaxies. This is in part because
at given absolute magnitude blue central galaxies tend to 
have lower mass halos than red ones. 

From Fig. 2 we see that assembly bias is strongest for
faint red central galaxies. These galaxies reside at the centres
of low-mass (~$10^{11}\,M_\odot$) dark matter halos and have
a $b$ value of about 1.4, which translates to an autocorrelation
amplitude about twice that which would have been
found if their halos had been randomly chosen according
to their mass alone. The mean formation redshift of halos
with $-17 > M_{b_J} - 5\log h > -18$ red central galaxies is
$z_{\rm form} \sim 2.9$, whereas blue central galaxies of similar magnitude
have halos with $z_{\rm form} \sim 1.8$. As noted above, many of the
faint red central galaxies occupy halos which have recently 
passed through a much more massive system. As a result 
they have both high formation redshifts and high density en- 
vironments. This accounts for much of their strong assembly 
bias. The bias for all faint red galaxies (see the top panel) 
is less pronounced due to dilution by satellites in group and 
cluster mass halos. As GSW05 showed, the correlation be- 
tween assembly history and environment is much weaker for 
Figure 3. Assembly bias for absolute magnitude limited samples
of red and blue central galaxies (i.e. for halos with red or blue central
galaxies brighter than a specified limit). Results are shown for
three different implementations of the shuffling procedure of Section 3:
swapping among halos of similar virial mass (replotted
from Fig. 2, solid lines); swapping among halos of similar virial
mass and similar concentration (dashed-dotted lines); and swapping
among halos of similar virial mass and similar formation time
(long-dashed lines). The grey shaded region surrounding each line
indicates the $1\sigma$ scatter among 10 shuffled catalogues. Assembly
bias is somewhat weaker when halo concentration or formation
time is specified in addition to mass, but it is not eliminated.



these than for low-mass halos. In contrast, the blue galaxy 
curves are similar in the top and bottom panels: this is sim- 
ply because most blue galaxies are central galaxies. Bright 
red central galaxies are largely unaffected by assembly bias 
because they have high mass halos. Such halos almost never 
have blue central galaxies and for them the GSW05 effect 
is, in any case, weak. 

The extent to which assembly bias effects are due to 
the properties of satellite galaxies rather than to those of 
central galaxies can be tested by shuffling satellite popu- 
lations as before while keeping all central galaxies in their 
original positions. We have done this, finding relatively weak 
effects. If we shuffle satellites only, we find an assembly bias 
$(\xi_{\rm orig}/\xi_{\rm shuff})^{1/2} \sim 1.02$ to all magnitude limits for the full
galaxy population. For the red galaxies, $(\xi_{\rm orig}/\xi_{\rm shuff})^{1/2} \sim 1.04$,
while for the blue galaxies there is no significant effect,
$(\xi_{\rm orig}/\xi_{\rm shuff})^{1/2} \sim 1.0$. The fact that shuffling satellites alone
changes the correlation amplitude suggests that the satel- 
lite population of a halo somehow "knows" about its large- 
scale environment, in the sense that halos with many neigh- 
bours tend to have more substructure than similar mass ha- 
los with few. A weak tendency for halos with neighbours 
to have above-average substructure was already noted by
Wechsler et al. (2006).



4.3 A second variable? 

Figure 4. The distribution of halo formation redshifts for two ranges of halo mass as indicated. Solid lines show the distribution for
all halos in each mass range and are the same in both panels. Dotted lines in the left panel show distributions for the 30% of halos in
each mass range with the faintest central galaxies, while dashed lines are for the 30% tail with the brightest central galaxies. Dotted and
dashed lines in the right panel are similar except they now refer to the halos with the bluest and reddest central galaxies respectively.
For lower mass halos the colour, and to some extent also the luminosity, of the central galaxy is correlated with formation redshift. At
high masses such effects may also be present but our results are noisier because of the much smaller number of halos involved.

The above results show that populating dark halos with a
semi-analytic or HOD algorithm based on halo mass alone
will result in correlation functions which have systematic
errors between 5% and a factor of 2 depending on subsample
definition. We now ask whether more complex algorithms
which include dependences on additional halo properties
can account for the assembly bias in our simulated
catalogues. At some level this must be possible since the 
galaxy content of each simulated halo is determined by its 
detailed assembly history. It is unclear, however, whether 
this history is suitably summarised by parameters such as 
formation redshift or concentration. (Navarro et al. (1996),
Wechsler et al. (2002) and Gao et al. (2004) have shown
that both these properties are closely related to the growth
history of a halo's main progenitor.) To explore this issue we
consider two more highly constrained shuffling procedures:
swapping galaxy populations between halos of similar mass
and similar formation redshift, and between halos of similar
mass and similar concentration. In the following, formation
redshift is defined as the redshift when a halo's main progenitor
contains half its final mass, as used by Gao et al. (2004)
and GSW05. We linearly interpolate halo growth between
simulation outputs to increase the precision with which this
redshift can be estimated. To estimate halo concentration
we take the measured $V_{\rm vir}$ and $V_{\rm max}$ for each halo and solve
Eq. 5 in Navarro et al. (1996).
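
Both estimates can be sketched briefly. The code below makes two assumptions that the text does not confirm. First, the interpolation is taken to be linear in redshift (it could equally be in time or expansion factor). Second, the concentration relation is taken to be the standard NFW result $(V_{\rm max}/V_{\rm vir})^2 = 0.216\,c / [\ln(1+c) - c/(1+c)]$, which we assume is the equation referred to above. All function names are hypothetical.

```python
import math

def formation_redshift(z, m_main, frac=0.5):
    """Redshift at which the main progenitor mass crosses frac * final mass.

    z and m_main are ordered from the final output (lowest z) back in time.
    Mass is interpolated linearly in redshift between adjacent outputs
    (an assumption). Walking back from the final output, the first crossing
    found is returned.
    """
    target = frac * m_main[0]
    for i in range(1, len(z)):
        if m_main[i] <= target:
            m_hi, m_lo = m_main[i - 1], m_main[i]
            t = (m_hi - target) / (m_hi - m_lo)
            return z[i - 1] + t * (z[i] - z[i - 1])
    return None  # mass never falls to the target within the stored outputs

def nfw_velocity_ratio_sq(c):
    """(Vmax / Vvir)^2 for an NFW halo of concentration c."""
    return 0.216 * c / (math.log(1.0 + c) - c / (1.0 + c))

def concentration(v_vir, v_max, c_lo=2.163, c_hi=1000.0, tol=1e-8):
    """Solve the NFW relation for c by bisection on [c_lo, c_hi], where the
    ratio increases with c. Returns None if no solution lies in the bracket."""
    target = (v_max / v_vir) ** 2
    if not nfw_velocity_ratio_sq(c_lo) <= target <= nfw_velocity_ratio_sq(c_hi):
        return None
    while c_hi - c_lo > tol:
        mid = 0.5 * (c_lo + c_hi)
        if nfw_velocity_ratio_sq(mid) < target:
            c_lo = mid
        else:
            c_hi = mid
    return 0.5 * (c_lo + c_hi)
```

The lower bracket is set near $c \approx 2.163$, where the NFW velocity ratio reaches its minimum of about unity, so the relation is invertible only for larger concentrations.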

In Fig. 3 we show the result of this exercise. We plot relative
bias as defined above (again estimated from an ensemble
of 10 shuffled catalogues) for absolute magnitude limited
central galaxy subsamples split by colour. These were the
subsamples with the most pronounced effects in Fig. 2 and
the results from that figure are repeated here as solid lines.
The other lines show how the relative bias is reduced when
shuffling preserves halo formation redshift or halo concentration
in addition to halo mass, thus how much of the assembly
bias in the original simulated catalogue can be represented
using these additional halo properties. Note that results for
the "all colour" and "all galaxy" catalogues presented in
Figure 2 show dependences on these additional parameters
that are much weaker than the extreme cases shown here,
typically less than a few percent. We thus omit them for
clarity.

Interestingly, Fig. 3 shows that neither formation redshift
nor concentration encodes sufficient information to account
for the assembly bias of the simulated galaxy catalogue.
Of the two parameters, formation redshift is the most
successful, accounting for about 40% of the assembly bias
for faint red central galaxies (at $M_{b_J} - 5\log h = -17$ the
relative bias is reduced from 1.37 to 1.22) but only a few
percent of the assembly bias for bright blue central galaxies. 
Employing concentration as the second parameter is only 
about half as effective in reducing the relative bias. Concen- 
tration dependences can account for only a small fraction 
of the measured assembly bias. Clearly, although the galaxy 
content of our simulated halos depends only on their mass 
and their assembly history, there is some aspect of the as- 
sembly history which is not encoded in halo concentration or 
formation redshift and which correlates with large-scale en- 
vironment. Fig.l^demonstrates that halo concentration and 
halo formation redshift do not encode the same information 
about large-scale clustering and that neither provides the 
information needed for precise modelling of the large-scale 
clustering of galaxies. 
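
As a quick check of the 40% figure quoted above, one can measure the assembly bias by the excess of the relative bias over unity. This convention is an assumption; the text does not spell out how the fraction is computed. Writing b_M for the shuffle at fixed mass and b_{M,z} for the shuffle that also preserves formation redshift (both symbols introduced here for illustration only):

```latex
\frac{b_{M} - b_{M,z}}{b_{M} - 1}
  = \frac{1.37 - 1.22}{1.37 - 1}
  = \frac{0.15}{0.37}
  \approx 0.41
```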

Fig- m explores the relation of central galaxy properties 
to formation redshift in more detail. It shows the distribu- 
tion of formation redshift for dark matter halos in two mass 
ranges, log M_vir > 14.0 and log M_vir = 10.9 - 11.1 (h^-1 M_sun).
In each range we compare the distribution for all halos (solid 
lines) with those for subpopulations which are extreme in 
their central galaxy properties, either luminosity or colour. 
Dashed and dotted curves in the left panel correspond to 
the 30% tails containing the brightest and the faintest cen- 
tral galaxies respectively, while in the right panel they refer 
instead to the 30% bluest and reddest central galaxies. 
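
Selecting the two tails within one mass range can be sketched as below. The function names and the toy numbers are hypothetical, and ties are broken by input order, which is an arbitrary choice. With absolute magnitudes the low tail is the brightest one, since brighter galaxies have more negative magnitudes.

```python
def tails(values, frac=0.3):
    """Indices of the lowest and highest `frac` of `values`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_tail = int(round(frac * len(values)))
    return order[:n_tail], order[len(values) - n_tail:]


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


# Toy data for ten halos in one mass bin (made-up numbers, not simulation output).
magnitude = [-17.9, -17.2, -17.5, -17.8, -17.1, -17.6, -17.4, -18.0, -17.3, -17.7]
z_form = [1.2, 2.1, 1.6, 1.3, 2.4, 1.5, 1.8, 1.1, 2.0, 1.4]

brightest, faintest = tails(magnitude)
z_bright = mean([z_form[i] for i in brightest])
z_faint = mean([z_form[i] for i in faintest])
```

The same `tails` call applied to a list of colours would return the bluest and reddest subsets, whose formation redshift distributions can then be compared with that of all halos in the bin.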

Fig. |l] shows the well known result that low-mass halos
have a broad distribution of formation redshifts, centred at
z ~ 1.7 and extending from z > S to z < 0.5, while high-mass
halos formed more recently, with a distribution centred at
z ~ 0.7 and with tails extending from z ~ 0.0 to z ~ 1.5
(see Lacey & Cole 1993). For low-mass halos there
is a weak but significant shift in formation redshift distribu- 
tion between those with bright and those with faint central 
galaxies: faint central galaxies live in halos that formed sys- 
tematically earlier than those of their brighter counterparts. 
(In this mass bin the mean absolute magnitudes of the 30% 
faintest and 30% brightest central galaxies are -17.2 and
-17.9, respectively). The converse appears true in high-mass
halos: brighter central galaxies tend to occupy halos that 
formed earlier, while fainter central galaxies occupy halos 
that formed later. (The mean absolute magnitudes of the 
30% faintest and brightest central cluster galaxies are -21.1
and -22.1 respectively).

For low-mass halos the effects as a function of colour 
are significantly stronger. The right panel of Fig0] shows 
that halos with red central galaxies have significantly earlier 
formation redshifts than those with blue. The mean B - V
colour of the central galaxy shifts from 0.44 to 0.69 between 
the two tails. For high-mass halos there is little difference 
in the formation redshift distribution between those with 
the reddest and those with the bluest central galaxies. This 
is simply because most of the central galaxies in high-mass 
halos are red; the shift in colour between the two tails is 
only from 0.90 to 0.94 in this case. In combination with the 
effect discovered by GSW05, the correlation between central 
galaxy colour and halo formation redshift in low-mass halos 
explains a significant part (roughly half) of the large-scale 
assembly bias which we measure for faint red central galaxies 
(see Fig.OJ. 

5 DISCUSSION AND CONCLUSIONS 

We ask a simple question in this paper: to what degree is 
galaxy clustering influenced by the assembly bias of dark
halos, the fact that their clustering depends not only on 
their mass but also on the details of their assembly history? 
Such dependences are neglected in the halo occupation dis- 
tribution schemes that have become popular for constructing 
galaxy catalogues from dark matter simulations. Our results 
show that they are significant, however, and thus may intro- 
duce systematic errors if HOD techniques are used to derive 
cosmological parameters from the clustering in large galaxy 
surveys. It appears that this problem is not easily addressed 
by including an additional halo parameter in HOD models. 
Detailed tracking of galaxy formation during halo assembly 
seems necessary. 

Our conclusions are based on analysis of galaxy clus- 
tering in a very large simulation in which the formation 
of galaxies has been followed explicitly during halo assem- 
bly. By comparing galaxy catalogues drawn from this sim- 
ulation with 'shuffled' catalogues where galaxy populations 
have been swapped among halos of similar properties, we can 
measure the sensitivity of galaxy clustering to the details of 
halo assembly. Our principal results can be summarised as 
follows: 

• Assembly bias can be significant and can be of either
sign. The effects differ qualitatively between galaxy samples
selected above a halo mass limit and those selected above a 
galaxy luminosity limit, as well as between samples contain- 
ing all galaxy types and those containing only the central 
galaxies of halos. In addition they depend on galaxy colour. 
The strongest effects are found for low-luminosity, red cen- 
tral galaxies. Assembly bias enhances the amplitude of their 
2-point correlations by almost a factor of 2. 

• Simulation galaxies selected to a faint absolute magnitude
limit (e.g. M_bJ - 5 log h < -17) are more strongly clustered
than they would be if halos of a given mass had galaxy
populations distributed independently of other halo properties.
This effect reverses for samples selected above a relatively
bright absolute magnitude limit (e.g. M_bJ - 5 log h < -20).
In both cases the bias alters the amplitude of the 2-point
correlation function of the galaxies by up to 10%.

• When absolute magnitude limited galaxy samples are
split by colour at B - V = 0.8, the blue and red subsamples
have values of assembly bias which are offset from the value
for their parent samples by -5% and +5% respectively,
corresponding to 10% offsets in autocorrelation amplitude.

• As expected from the results of GSW05, halo forma- 
tion redshift encodes some of the effects leading to assembly 
bias. Surprisingly, however, allowing the galaxy populations 
of halos to depend on halo formation redshift in addition to 
halo mass accounts for only about 40% of the assembly bias for
red central galaxies and has almost no influence on that for blue
central galaxies. Halo concentration is even less successful 
as a second parameter, only half as effective as formation 
redshift in accounting for assembly bias in the simulation. 
Most of this bias must be due to a correlation between other 
aspects of halo assembly and halo environment. 

• As is well known, dark matter halos of a given mass 
show a wide range of formation redshifts. For given mass, the 
distribution of formation redshift depends both on the colour 
and on the luminosity of the central galaxy. The effects are in 
most cases quite weak, however, reinforcing the impression 
that other aspects of halo formation must be responsible for 
the strong assembly bias we find for colour- and absolute 
magnitude selected samples of central galaxies. 
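
The percentages in the list above are mutually consistent if the relative bias b is taken to scale the 2-point correlation amplitude as b^2. That scaling is assumed here rather than quoted from the definition given earlier in the paper:

```latex
\frac{\xi_{\rm original}}{\xi_{\rm shuffled}} = b^{2}: \qquad
1.05^{2} \approx 1.10, \qquad
0.95^{2} \approx 0.90, \qquad
1.37^{2} \approx 1.88
```

The last value uses the relative bias of 1.37 quoted for faint red central galaxies and corresponds to the enhancement by almost a factor of 2.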

New large-scale galaxy surveys, such as PanSTARRS
(Kaiser et al. 2002) and the Dark Energy Survey
(Abbot et al. 2005), are currently being designed to
obtain extremely precise measures of galaxy clustering at 
a variety of redshifts. The goal is to use these to infer the 
linear power spectrum of density fluctuations, the rate at
which it grows with redshift, and the recent expansion 
history of the Universe. These quantities then constrain 
the nature of Dark Matter, the nature of Dark Energy, and 
the process which created all cosmic structure. Significant 
conclusions will require measures of galaxy clustering to be 
translated into estimates of more fundamental cosmological 
parameters (e.g. the amplitude of linear fluctuations at 
each redshift, the characteristic scale of baryon wiggles, the 
effective slope of the primordial power spectrum...) with a 
precision of a few percent or better. The stated goal for the 
HOD machinery is to convert observed clustering measures 
to fundamental quantities at this level of precision, while 
bypassing the need to understand the details of galaxy 
formation. Our results suggest that the details of galaxy 
formation do affect clustering statistics at least at the 5%
level in a way which cannot easily be included in an HOD
model. 

It is important to stress that we do not claim that our 
galaxy formation model is correct, just that it is plausible, 
and so can be used to explore the size of assembly bias ef- 
fects. For many applications a systematic error in galaxy cor- 
relation amplitudes at the 5 or 10 percent level can safely 
be ignored. Nevertheless, if we wish to understand galaxy 
formation we must clearly model it. The results presented 
in this paper show not only that galaxy formation influ- 
ences large-scale clustering in unexpected ways which are 
not consistent with current simplified clustering models, but 
also that these models may be subject to systematic errors 
which make them unsuitable for interpreting precision mea- 
sures of galaxy clustering in terms of fundamental physics. A 
deeper understanding of galaxy formation appears required 
to carry through this programme successfully. 